Scaling and memory in recurrence intervals of Internet traffic 
By studying the statistics of recurrence intervals, $\tau$, between
volatilities of Internet traffic rate changes exceeding a certain threshold
$q$, we find that the probability distribution functions, $P_q(\tau)$, for
both byte and packet flows show a scaling property,
$P_q(\tau) = \frac{1}{\bar{\tau}} f\left(\frac{\tau}{\bar{\tau}}\right)$.
The scaling functions for both byte and packet flows obey the same stretched
exponential form, $f(x) = A\exp(-Bx^{\beta})$, with $\beta \approx 0.45$. In
addition, we detect a strong memory effect: a short (or long) recurrence
interval tends to be followed by another short (or long) one. Detrended
fluctuation analysis further demonstrates the presence of long-term
correlations in the recurrence intervals.

PACS numbers: 89.75.-k, 89.75.Da, 89.20.Hh, 05.40.-a



Many complex systems are characterized by heavy-tailed distributions, such as
power-law distributions, lognormal distributions [ref.], and stretched
exponential distributions [ref.]. These distributions imply a nontrivial
probability of extreme events. Statistical laws on these extreme events
provide evidence for understanding the mechanisms that underlie the dynamical
behavior of the corresponding complex systems. Recently, some typical complex
systems, such as earthquakes [refs.], financial markets [refs.] and many
other natural hazards, have been widely investigated. Taking earthquakes as
an example, besides the well-established Omori law and Gutenberg-Richter law,
scaling laws for the temporal and spatial variability of earthquakes have
been observed by Bak et al. [ref.] and Corral [refs.], and the memory effect
in the occurrence of earthquakes has been revealed through the statistics of
the recurrence times above a certain magnitude [ref.].

The Internet has been viewed as a typical complex system that evolves in time
through the addition and removal of nodes and links, and empirical evidence
has demonstrated its small-world and scale-free structural properties
[refs.]. One research focus, Internet traffic, has been widely studied by
computer scientists, physicists and others. For instance, Leland et al. first
found the self-similar nature and long-range dependence of Ethernet traffic,
which have serious implications for the design, congestion control, and
analysis of computer communication networks [18]. Since then, several traffic
models have been proposed to understand the underlying mechanisms of
information transport and congestion control in Internet traffic
[19, 20, 21]. In particular, those models (see Refs. [19, 20] for the models
and Ref. [21] for the time series analysis) can, to some extent, reproduce
the self-similar nature of Internet traffic, which indicates the existence of
traffic burstiness and large volatility of traffic rate changes.

FIG. 1: (Color online) The density flows of both bytes and packets at
one-second resolution. Clustering behavior is observed between the red lines.


Herein we are interested in the large volatility that implies sudden, drastic
changes of traffic rate. In previous studies, by analyzing a set of time
series data of round-trip time, Abe and Suzuki [refs.] reported that the
drastic changes, termed Internet quakes, are characterized by the Omori law
and the Gutenberg-Richter law. By statistical analysis of the recurrence
interval $\tau$ between volatilities of traffic rate changes exceeding a
certain threshold $q$, this Letter reports that: (i) the probability
distribution functions (pdfs) $P_q(\tau)$ for both byte and packet flows,
rescaled by the mean recurrence interval $\bar{\tau}$, exhibit a scaling
property in which the scaling function $f(x)$ follows a stretched exponential
form $f(x) = A\exp(-Bx^{\beta})$ with $\beta \approx 0.45$ for all data;
(ii) a short/long recurrence interval tends to be followed by another
short/long recurrence interval, implying a strong memory effect.

FIG. 2: Illustration of recurrence intervals, $\tau_q$, of normalized
volatility time series with $q = 1$ and $q = 2$.

FIG. 3: Number of intervals vs. the threshold $q$ in linear-log plots. Both
of the two fitting lines are of slopes 0.474.

FIG. 4: (Color online) (a) and (b): Distributions of recurrence intervals
between two consecutive volatilities of traffic rate changes above some
thresholds. (c) and (d): Rescaled distributions $P_q(\tau)\bar{\tau}$ vs.
rescaled recurrence intervals $\tau/\bar{\tau}$. The solid lines denote a
stretched exponential function $f(x) = A\exp(-Bx^{\beta})$ with
$\beta \approx 0.45$. (e) and (f): Rescaled distributions
$P_q(\tau)\bar{\tau}$ vs. $\tau/\bar{\tau}$ for shuffled data, which obey an
exponential form $f(x) = A\exp(-Bx)$. The rescaled distributions of real data
have heavier tails than the ones of shuffled data.

The data used in this paper are part of the Ethernet traffic set collected at
Bellcore. They correspond to one "normal" hour's worth of traffic, collected
every 10 milliseconds, resulting in a time series of length 360000. There are
two types of measurements, recording the number of bytes and packets per unit
time, respectively. The data and related information can be found at the
Internet Traffic Archive [23]. These data have been widely used and have
become the most important benchmark data in relevant areas. We first
aggregate the time series to one-second resolution, that is, each aggregated
data point is an average of 100 original points. As shown in Fig. 1, the time
series exhibit a clustering phenomenon that results from traffic congestion
in the Internet. For both byte and packet flows, we use the absolute value of
changes, $|\Delta S_i| = |S_i - S_{i-1}|$, where $S_i$ denotes the data point
at time $i$, to quantify the volatility. It has been demonstrated that the
pdf of $\Delta S_i$ decays in an asymptotic power-law form and that the
volatilities are long-term correlated [ref.]. We then normalize the
volatility time series by its standard deviation,
$(\langle |\Delta S_i|^2 \rangle - \langle |\Delta S_i| \rangle^2)^{1/2}$.
In this way, the threshold $q$ is in units of the standard deviation of the
volatility. An illustration of the recurrence interval $\tau$ is shown in
Fig. 2.

FIG. 5: (Color online) Original distribution (a) and the rescaled
distribution (b) of recurrence intervals between two consecutive volatilities
for packet flow. Here, the range of threshold $q$ is much larger than that in
Figure 4. The case for byte flow is almost the same, thus it is omitted here.
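The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, "exceeding" is assumed to mean strictly greater than $q$, and intervals are measured in samples of the aggregated series.

```python
import math

def aggregate(series, block=100):
    """Average consecutive blocks (e.g. 100 samples of 10 ms give 1 s)."""
    n = len(series) // block
    return [sum(series[k * block:(k + 1) * block]) / block for k in range(n)]

def normalized_volatility(s):
    """|S_i - S_{i-1}| divided by the standard deviation of that quantity."""
    vol = [abs(s[i] - s[i - 1]) for i in range(1, len(s))]
    mean = sum(vol) / len(vol)
    mean_sq = sum(v * v for v in vol) / len(vol)
    sigma = math.sqrt(mean_sq - mean * mean)
    return [v / sigma for v in vol]

def recurrence_intervals(vol, q):
    """Gaps (in samples) between consecutive volatilities larger than q."""
    hits = [i for i, v in enumerate(vol) if v > q]
    return [b - a for a, b in zip(hits, hits[1:])]

# Toy series (made-up numbers, for illustration only).
toy_vol = normalized_volatility([0, 1, 1, 4, 4, 4, 9, 9])
toy_intervals = recurrence_intervals(toy_vol, 1.0)
```

For the Bellcore data one would call `aggregate` on the 10 ms series first and then pass the result to `normalized_volatility`.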

For the normalized data, a small threshold $q$ does not yield recurrence
intervals between large volatilities (extreme events); therefore we
concentrate on the cases with $q > 1$. As shown in Fig. 3, for both byte and
packet flows, the number of recurrence intervals decays exponentially as $q$
increases. For large $q$ the results are unreliable because of poor
statistics, and thus we mainly discuss the statistics for a limited range of
$q$.
Figures 4(a) and 4(b) present the pdfs $P_q(\tau)$ for byte and packet flows
with different $q$. They are clearly broader than the Poisson distribution
expected for uncorrelated data, and the pdf for larger $q$ decays more slowly
than that for smaller $q$. To understand how $P_q(\tau)$ depends on $q$, in
Figs. 4(c) and 4(d) we show the rescaled pdfs, $P_q(\tau)\bar{\tau}$, for
byte and packet flows as functions of the rescaled recurrence interval
$\tau/\bar{\tau}$. The data collapse onto a single curve, indicating a
scaling relation

$$P_q(\tau) = \frac{1}{\bar{\tau}} f\left(\frac{\tau}{\bar{\tau}}\right), \qquad (1)$$

which suggests that the scaling function does not depend directly on the
threshold $q$ but only through $\bar{\tau} = \bar{\tau}(q)$.
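The rescaling behind Eq. (1) can be sketched as follows. The snippet estimates the density of $x = \tau/\bar{\tau}$, which equals $P_q(\tau)\bar{\tau}$ because $d\tau = \bar{\tau}\,dx$. Logarithmic binning and the name `rescaled_pdf` are assumptions made for this illustration; the text does not state how the histograms were binned.

```python
import math

def rescaled_pdf(intervals, n_bins=20):
    """Return (edges, density) for x = tau / mean(tau) on logarithmic bins.

    density[k] estimates P_q(tau) * mean(tau) in bin [edges[k], edges[k+1]).
    """
    mean_tau = sum(intervals) / len(intervals)
    xs = [t / mean_tau for t in intervals]
    lo = min(xs)
    hi = max(xs) * (1.0 + 1e-9)  # nudge so the largest value is inside the last bin
    log_ratio = math.log(hi / lo)
    edges = [lo * math.exp(log_ratio * k / n_bins) for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in xs:
        k = min(int(n_bins * math.log(x / lo) / log_ratio), n_bins - 1)
        counts[k] += 1
    density = [c / (len(xs) * (edges[k + 1] - edges[k]))
               for k, c in enumerate(counts)]
    return edges, density

# Toy intervals (made-up numbers, for illustration only).
edges, density = rescaled_pdf([1, 1, 2, 3, 5, 8, 13, 21], n_bins=5)
```

Plotting `density` against the bin centres for several thresholds on the same axes is one way to check for a collapse of the kind shown in Figs. 4(c) and 4(d).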
Furthermore, as shown in Figs. 4(c) and 4(d), the scaling functions for both
byte and packet flows follow the same stretched exponential form,
$f(x) = A\exp(-Bx^{\beta})$ with $\beta \approx 0.45$, indicating a possibly
universal scaling property in the recurrence intervals of Internet traffic
data (see also a similar scaling in the intertrade time in financial markets
[26]). Therefore, one can estimate $P_q(\tau)$ for an arbitrary $q$ from the
knowledge of $P_{q'}(\tau)$ for a certain $q'$. This scaling property is
particularly significant for understanding the statistics of the large-$q$
case, where the number of data points is usually very small. A similar
analysis for very large $q$ has also been done: as shown in Fig. 5, the
rescaled distributions are much closer to each other than the original
distributions. However, it is hard to tell whether these curves collapse onto
a single master curve, since the number of intervals for large $q$ is very
small. Hereinafter, we focus on the statistics for $q \in [0, 1.375]$.

FIG. 6: (a) and (b): Typical examples of recurrence intervals for byte and
packet flows with $q = 1$. (c) and (d): Same as (a) and (b), except that the
original volatility time series are shuffled. The horizontal line is used to
indicate the cluster of large recurrence intervals.
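One possible way to estimate $A$, $B$ and $\beta$ from a rescaled pdf is sketched below. The fitting procedure is not described in the text, so this is an assumed approach: for each trial $\beta$ on a grid, $\ln f = \ln A - B x^{\beta}$ is linear in $x^{\beta}$ and can be fitted by ordinary least squares, and the $\beta$ with the smallest residual is kept. The function name and the grid are hypothetical.

```python
import math

def fit_stretched_exponential(x, f, betas=None):
    """Fit f(x) = A * exp(-B * x**beta) by least squares on ln f.

    Assumes all f values are positive. Returns (A, B, beta).
    """
    if betas is None:
        betas = [0.05 * k for k in range(2, 41)]  # 0.10, 0.15, ..., 2.00
    y = [math.log(v) for v in f]
    n = len(y)
    mean_y = sum(y) / n
    best = None
    for beta in betas:
        u = [xi ** beta for xi in x]
        mean_u = sum(u) / n
        s_uu = sum((ui - mean_u) ** 2 for ui in u)
        s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
        slope = s_uy / s_uu
        intercept = mean_y - slope * mean_u
        sse = sum((yi - intercept - slope * ui) ** 2 for ui, yi in zip(u, y))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), -slope, beta)
    _, A, B, beta = best
    return A, B, beta

# Synthetic check: data generated from a known stretched exponential.
x = [0.1 * k for k in range(1, 51)]
f = [2.0 * math.exp(-1.5 * xi ** 0.45) for xi in x]
A, B, beta = fit_stretched_exponential(x, f)
```

Fitting in log space weights the tail more heavily than a fit on the raw densities would, which may or may not match how the curves in Fig. 4 were obtained.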

The stretched exponential distributions of rescaled recurrence intervals
suggest the existence of correlations in the volatilities. In contrast, the
recurrence intervals of an uncorrelated time series are expected to follow a
Poisson distribution, with $\log f(x) \sim -x$. To confirm this expectation,
the volatilities are shuffled to remove the correlations, and the resulting
distributions, as shown in Figs. 4(e) and 4(f), decay exponentially, which is
remarkably different from the real time series. Furthermore, very short and
very long recurrence intervals occur more frequently in the real data [see
Figs. 4(c)-4(f)], indicating burstiness of Internet traffic, similar to that
observed in many other complex systems [27].
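The exponential form expected for uncorrelated data can be made explicit with a short derivation. It is a sketch under two assumptions that are not stated in the text: each volatility exceeds $q$ independently with the same probability $p$ (a symbol introduced here for illustration), and time is counted in discrete steps.

```latex
% Assumption: independent exceedances with probability p per time step.
\begin{align*}
P_q(\tau) &= p\,(1-p)^{\tau-1}, \qquad \tau = 1, 2, \dots \\
\bar{\tau} &= \sum_{\tau \ge 1} \tau\, p\,(1-p)^{\tau-1} = \frac{1}{p} \\
P_q(\tau)\,\bar{\tau} &= (1-p)^{\tau-1} \approx e^{-p\tau}
  = e^{-\tau/\bar{\tau}} \qquad (p \ll 1) \\
\Rightarrow\quad f(x) &\approx e^{-x}, \qquad \log f(x) \approx -x .
\end{align*}
```

In this limit the exponential form $f(x) = A\exp(-Bx)$ has $A \approx B \approx 1$, and it depends on $q$ only through $\bar{\tau} = 1/p$.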

The scaling property of $P_q(\tau)$ only indicates long-term correlations in
the volatility time series of traffic rate changes, but does not tell whether
the recurrence intervals are themselves correlated. To answer this question,
we next investigate the memory effect in the recurrence intervals. Before
that, we show typical examples of recurrence intervals for byte and packet
flows in Figs. 6(a) and 6(b), as well as the corresponding shuffled sequences
in Figs. 6(c) and 6(d). Compared with the shuffled ones, the original data
exhibit clustering of large intervals, which indicates the memory effect that
a short (or long) recurrence interval tends to be followed by another short
(or long) one.

FIG. 7: Conditional pdfs, $P_q(\tau|\tau_0)$, of recurrence intervals of byte
and packet flows for $\tau_0$ in $Q_1$ (full symbols) and $Q_8$ (open
symbols) as functions of $\tau/\bar{\tau}$. The fitting curves follow
stretched exponential forms.
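The comparison between original and shuffled sequences can be sketched with a shuffled surrogate. The code below is an illustration with hypothetical function names. The coefficient of variation is used only as a convenient summary of how clustered the intervals are; it is not a statistic reported in the text, and the toy series is artificial.

```python
import random

def recurrence_intervals(vol, q):
    """Gaps (in samples) between consecutive volatilities larger than q."""
    hits = [i for i, v in enumerate(vol) if v > q]
    return [b - a for a, b in zip(hits, hits[1:])]

def shuffled_intervals(vol, q, seed=0):
    """Recurrence intervals after randomly permuting the volatility series."""
    surrogate = list(vol)
    random.Random(seed).shuffle(surrogate)
    return recurrence_intervals(surrogate, q)

def coefficient_of_variation(intervals):
    """Standard deviation divided by the mean (close to 1 for a geometric law)."""
    mean = sum(intervals) / len(intervals)
    var = sum((t - mean) ** 2 for t in intervals) / len(intervals)
    return var ** 0.5 / mean

# Artificial, strongly clustered series: bursts of 50 large values
# separated by 450 quiet samples, repeated 20 times.
clustered = ([2.0] * 50 + [0.0] * 450) * 20
cv_real = coefficient_of_variation(recurrence_intervals(clustered, 1.0))
cv_shuffled = coefficient_of_variation(shuffled_intervals(clustered, 1.0, seed=1))
```

Shuffling keeps the distribution of volatility values but destroys their ordering, so any difference between the two interval sequences comes from temporal correlations.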

To quantify the memory effect, we study the conditional pdf
$P_q(\tau|\tau_0)$, the probability that a recurrence interval $\tau$
immediately follows a recurrence interval $\tau_0$. If no memory effect
exists, $P_q(\tau|\tau_0)$ will be identical to $P_q(\tau)$ and independent
of $\tau_0$. We therefore study $P_q(\tau|\tau_0)$ not for a specific
$\tau_0$, but for a range of $\tau_0$ values. Analogous to the analysis of
daily volatility return intervals, the recurrence intervals are sorted in
increasing order and divided into eight subsets, $Q_1, Q_2, \dots, Q_8$, so
that each subset contains 1/8 of the total data. Thus the $N/8$ smallest
recurrence intervals belong to $Q_1$, whereas the $N/8$ largest ones belong
to $Q_8$, where $N$ denotes the total number of data points. Figures 7(a) and
7(b) show $P_q(\tau|\tau_0)$ for byte and packet flows. The distribution
corresponding to $Q_1$ is obtained by recording all the $\tau$ values whose
predecessor $\tau_0$ is no less than the smallest interval in $Q_1$ and no
more than the largest interval in $Q_1$ (see Ref. [11] for more details). As
shown in Fig. 7, the rescaled pdfs, $P_q(\tau|\tau_0)\bar{\tau}$, for
different $q$ collapse onto a single curve that can also be fitted by
stretched exponential functions. The remarkable difference between the
distributions with $\tau_0$ in $Q_1$ and in $Q_8$ clearly demonstrates the
existence of the memory effect.

FIG. 8: Long-term memory in recurrence intervals of byte and packet flows.
The full and open symbols represent the exponent $\alpha$ for the real data
and the shuffled data, respectively. The mean value of $\alpha$ for the real
data is 0.600 with mean error 0.009, while for the shuffled data it is 0.497
with mean error 0.007.
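The construction of the conditional samples can be sketched as follows. The function name is hypothetical, and one detail is an assumption: when several intervals share the same value across a subset boundary, the predecessor $\tau_0$ is assigned to the lowest subset whose largest interval is at least $\tau_0$.

```python
def conditional_successors(intervals, n_subsets=8):
    """Group each interval tau by the subset that its predecessor tau0 falls in.

    groups[0] holds the successors of tau0 in Q1, groups[-1] those of tau0 in
    the last subset.
    """
    ranked = sorted(intervals)
    size = len(ranked) // n_subsets
    # Largest interval of each subset except the last, which takes the rest.
    bounds = [ranked[(k + 1) * size - 1] for k in range(n_subsets - 1)]
    groups = [[] for _ in range(n_subsets)]
    for tau0, tau in zip(intervals, intervals[1:]):
        k = sum(tau0 > b for b in bounds)  # index of the subset containing tau0
        groups[k].append(tau)
    return groups

# Toy sequence with two subsets (made-up numbers, for illustration only).
toy_groups = conditional_successors([1, 8, 2, 7, 3, 6, 4, 5], n_subsets=2)
```

With `n_subsets=8`, the pdfs of `groups[0]` and `groups[7]`, rescaled by the mean of all intervals, correspond to the two families of symbols in Fig. 7.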

To check whether the memory effect is limited to neighboring recurrence
intervals, we use detrended fluctuation analysis (DFA), a benchmark method
for quantifying long-term correlations (see Ref. [28] for the original
method, and Refs. [29] and [30] for the effects of trends and
nonstationarities, respectively). The fluctuation $F(l)$ of a time series
over a window of $l$ seconds, computed by DFA, follows a power law,
$F(l) \sim l^{\alpha}$. An uncorrelated time series corresponds to
$\alpha = 0.5$, while a larger (or smaller) $\alpha$ indicates long-term
correlation (or anti-correlation). Figure 8 shows the values of $\alpha$ for
the recurrence intervals, which are all larger than 0.5 and have a mean value
of 0.600 with mean error 0.009, indicating the presence of long-term
correlations in the recurrence intervals. Furthermore, for the shuffled
recurrence intervals, the long-term correlations are absent, with
$\alpha \approx 0.5$ (mean value 0.497 with mean error 0.007).
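A minimal sketch of DFA with linear detrending in each window is given below. The detrending order, the window sizes and the function names are assumptions made for illustration, since the text does not specify them. The check uses synthetic white noise, for which $\alpha$ should be close to 0.5.

```python
import math
import random

def dfa(series, window_sizes):
    """Return [(l, F(l))] using non-overlapping windows and linear detrending."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:  # cumulative sum of deviations from the mean
        total += v - mean
        profile.append(total)
    result = []
    for l in window_sizes:
        n_win = len(profile) // l
        t_mean = (l - 1) / 2
        t_var = sum((t - t_mean) ** 2 for t in range(l))
        sq_sum = 0.0
        for w in range(n_win):
            seg = profile[w * l:(w + 1) * l]
            s_mean = sum(seg) / l
            slope = sum((t - t_mean) * (s - s_mean)
                        for t, s in enumerate(seg)) / t_var
            # Squared residuals around the least-squares line in this window.
            sq_sum += sum((s - s_mean - slope * (t - t_mean)) ** 2
                          for t, s in enumerate(seg))
        result.append((l, math.sqrt(sq_sum / (n_win * l))))
    return result

def scaling_exponent(fluct):
    """Least-squares slope of log F(l) against log l."""
    xs = [math.log(l) for l, _ in fluct]
    ys = [math.log(f) for _, f in fluct]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

# Synthetic white noise as an uncorrelated reference series.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
alpha_noise = scaling_exponent(dfa(noise, [8, 16, 32, 64, 128]))
```

Applying the same two functions to a sequence of recurrence intervals and to its shuffled version gives the kind of comparison summarized in Fig. 8.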

In summary, we have investigated the scaling and memory properties of
recurrence intervals of Internet traffic. The empirical pdfs $P_q(\tau)$ for
byte and packet flows each collapse onto a single curve when rescaled by the
mean recurrence interval $\bar{\tau}$, as shown in Eq. (1). The scaling
function has a stretched exponential form, $f(x) = A\exp(-Bx^{\beta})$ with
$\beta \approx 0.45$, for both byte and packet flows. This scaling property
can be used to predict the occurrence probability of rare events that
correspond to large $q$. We also detected the memory effect that a short (or
long) recurrence interval tends to be followed by another short (or long)
one, which is further demonstrated by the empirical result that the
conditional pdf, $P_q(\tau|\tau_0)$, depends strongly on $\tau_0$.
Furthermore, by using the DFA method, we found that the recurrence intervals
are indeed long-term correlated. Some recently reported empirical studies
show that Internet-based human activities, such as web accessing [31, 32] and
on-line entertainment [ref.], exhibit burstiness and memory in their temporal
statistics. All those activities contribute to Internet traffic, and thus we
think the analysis of the burstiness and memory of Internet traffic itself
can be considered a valuable complementary work. More interestingly, the
results suggest that the Internet shares some common properties with other
complex systems, such as earthquakes and financial markets [refs.], which
supports the possibly generic organizing principles governing the dynamics of
apparently disparate complex systems, as envisioned by Goh and Barabási [27].